AITopics | neural attention

Latent Alignment and Variational Attention

Neural Information Processing Systems | Mar-17-2026, 05:28:04 GMT

Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, the approach does not marginalize over latent alignments in a probabilistic sense. This property makes it difficult to compare attention to other alignment approaches, to compose it with probabilistic models, and to perform posterior inference conditioned on observed data. A related latent approach, hard attention, fixes these issues, but is generally harder to train and less accurate. This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference. We further propose methods for reducing the variance of gradients to make these approaches computationally feasible. Experiments show that for machine translation and visual question answering, inefficient exact latent variable models outperform standard neural attention, but these gains go away when using hard attention based training. On the other hand, variational attention retains most of the performance gain but with training speed comparable to neural attention.
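The distinction the abstract draws between soft attention, exact marginalization over a latent alignment, and lower bounds on that marginal can be sketched numerically. The following is a toy illustration only, not the paper's model: the single categorical alignment `z`, the dimensions, the random parameters, and all variable names are assumptions made for the example. In the paper the variational distribution comes from an amortized inference network; here the exact posterior stands in for it to show that the bound can be tight.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
T, d, V = 5, 4, 6                      # source length, hidden size, vocabulary (all hypothetical)
h = rng.normal(size=(T, d))            # source encodings, one per position
W = rng.normal(size=(d, V))            # output projection
prior = softmax(rng.normal(size=T))    # p(z | x): attention weights read as an alignment prior
y = 2                                  # index of the observed target word

# p(y | x, z) for every alignment z: predict from one source position at a time.
lik = softmax(h @ W)[:, y]

# Soft attention: average the encodings first, then predict once.
soft_prob = softmax(prior @ h @ W)[y]

# Exact latent variable model: marginalize the prediction over alignments.
log_marginal = np.log(prior @ lik)

# Jensen lower bound with the prior as proposal (the objective hard attention optimizes).
hard_bound = prior @ np.log(lik)

def elbo(q):
    # E_q[log p(y | x, z)] - KL(q || prior), a lower bound on log_marginal for any q.
    return q @ np.log(lik) - q @ (np.log(q) - np.log(prior))

# The exact posterior p(z | x, y) makes the bound tight.
posterior = prior * lik / (prior @ lik)

print("soft attention log-prob:", np.log(soft_prob))
print("exact log marginal:     ", log_marginal)
print("hard attention bound:   ", hard_bound)
print("ELBO at exact posterior:", elbo(posterior))
```

With the prior as the proposal, the ELBO reduces to the hard attention bound; a proposal closer to the posterior narrows the gap to the exact log marginal, which is the sense in which a variational bound can be tighter.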

artificial intelligence, natural language, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)


Latent Alignment and Variational Attention

Neural Information Processing Systems | Nov-20-2025, 22:52:53 GMT

Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, the approach does not marginalize over latent alignments in a probabilistic sense. This property makes it difficult to compare attention to other alignment approaches, to compose it with probabilistic models, and to perform posterior inference conditioned on observed data. A related latent approach, hard attention, fixes these issues, but is generally harder to train and less accurate. This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference. We further propose methods for reducing the variance of gradients to make these approaches computationally feasible. Experiments show that for machine translation and visual question answering, inefficient exact latent variable models outperform standard neural attention, but these gains go away when using hard attention based training. On the other hand, variational attention retains most of the performance gain but with training speed comparable to neural attention.

hard attention, latent alignment and variational attention, name change, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)


Neural Attention: A Novel Mechanism for Enhanced Expressive Power in Transformer Models

DiGiugno, Andrew, Mahmood, Ausif

arXiv.org Artificial Intelligence | Feb-24-2025

Transformer models typically calculate attention matrices using dot products, which have limitations when capturing nonlinear relationships between embedding vectors. We propose Neural Attention, a technique that replaces dot products with feed-forward networks, enabling a more expressive representation of relationships between tokens. This approach modifies only the attention matrix calculation while preserving the matrix dimensions, making it easily adaptable to existing transformer-based architectures. We provide a detailed mathematical justification for why Neural Attention increases representational capacity and conduct controlled experiments to validate this claim. When comparing Neural Attention and Dot-Product Attention, NLP experiments on WikiText-103 show a reduction in perplexity of over 5 percent. Similarly, experiments on CIFAR-10 and CIFAR-100 show comparable improvements for image classification tasks. While Neural Attention introduces higher computational demands, we develop techniques to mitigate these challenges, ensuring practical usability without sacrificing the increased expressivity it provides. This work establishes Neural Attention as an effective means of enhancing the predictive capabilities of transformer models across a variety of applications.
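A minimal sketch of the idea in this abstract, contrasting dot-product scores with scores produced by a small feed-forward network, might look as follows. The specifics are assumptions for illustration and are not taken from the paper: feeding the concatenated query and key to the network, the single ReLU hidden layer, the layer sizes, and the function names are all hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dot_product_scores(q, k):
    # Standard scaled dot-product scores: (n, d) x (m, d) -> (n, m).
    return q @ k.T / np.sqrt(q.shape[-1])

def neural_scores(q, k, w1, b1, w2, b2):
    # Score every (query, key) pair with a feed-forward network.
    # Assumed design: concatenate the pair, one ReLU hidden layer, scalar output.
    n, m = q.shape[0], k.shape[0]
    pairs = np.concatenate(
        [np.repeat(q[:, None, :], m, axis=1),   # (n, m, d)
         np.repeat(k[None, :, :], n, axis=0)],  # (n, m, d)
        axis=-1,
    )                                           # (n, m, 2d)
    hidden = np.maximum(pairs @ w1 + b1, 0.0)   # (n, m, hidden_size)
    return (hidden @ w2 + b2).squeeze(-1)       # (n, m), same shape as dot-product scores

def attend(scores, v):
    # Normalize scores row-wise and mix the value vectors.
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n, d, hidden_size = 4, 8, 16                    # hypothetical sizes
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
w1 = rng.normal(size=(2 * d, hidden_size)) * 0.1
b1 = np.zeros(hidden_size)
w2 = rng.normal(size=(hidden_size, 1)) * 0.1
b2 = np.zeros(1)

out_dot = attend(dot_product_scores(q, k), v)
out_neural = attend(neural_scores(q, k, w1, b1, w2, b2), v)
print(out_dot.shape, out_neural.shape)
```

Because both scoring functions return a matrix of the same shape, the rest of the attention computation is unchanged, which matches the abstract's point that only the attention matrix calculation is modified. The pairwise tensor of shape (n, m, 2d) also makes the extra memory and compute cost visible.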

dot-product attention, matrix, neural attention, (14 more...)

arXiv.org Artificial Intelligence

2502.17206

Country: North America > United States > Connecticut > Fairfield County > Bridgeport (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)


Latent Alignment and Variational Attention

Deng, Yuntian, Kim, Yoon, Chiu, Justin, Guo, Demi, Rush, Alexander

Neural Information Processing Systems | Feb-14-2020, 20:44:26 GMT

Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, the approach does not marginalize over latent alignments in a probabilistic sense. This property makes it difficult to compare attention to other alignment approaches, to compose it with probabilistic models, and to perform posterior inference conditioned on observed data. A related latent approach, hard attention, fixes these issues, but is generally harder to train and less accurate. This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference.

hard attention, latent alignment and variational attention, neural attention, (1 more...)

Neural Information Processing Systems

Genre: Research Report (0.43)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)


How to Generate Text from Images with Python

#artificialintelligence | Oct-15-2019, 09:25:09 GMT

In the Google Search: State of the Union last May, John Mueller and Martin Splitt devoted about a fourth of the address to image-related topics. They announced a long list of improvements to Google Image Search and predicted that it would be a massive untapped opportunity for SEO. SEO Clarity, an SEO tool vendor, released a report around the same time. Among other findings, it showed that more than a third of web search results include images. Images matter to search visitors not only because they are more visually attractive than text, but also because they instantly convey context that would take much longer to absorb by reading.

caption, image url, neural network, (16 more...)

#artificialintelligence

Industry: Information Technology > Services (0.35)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)


Neural Attention: Machine Learning Meets Neuroscience

#artificialintelligence | Oct-5-2016, 09:20:55 GMT

Neural attention has been applied successfully to a variety of applications, including natural language processing, vision, and memory. An attractive aspect of these neural models is their ability to extract relevant features from data with minimal feature engineering. Brian Cheung is a PhD student at UC Berkeley working with Professor Bruno Olshausen, as well as an intern at Google Brain. By drawing inspiration from the fields of neuroscience and machine learning, he hopes to create systems that can solve complex vision tasks using attention and memory. At the Deep Learning Summit in Singapore, Brian will share expertise on the fovea as an emergent property of visual attention, on ways to extend this ability to learning interpretable structural features of the attention window itself, and on finding conditions where these emergent properties are amplified or eliminated, providing clues to their function. I asked him a few questions ahead of the summit to learn more.

artificial intelligence, deep learning, machine learning, (13 more...)

#artificialintelligence

Country: